
stream: experimental stream/iter implementation#62066

Open
jasnell wants to merge 56 commits into nodejs:main from jasnell:jasnell/new-streams-prototype

Conversation


@jasnell jasnell commented Mar 1, 2026

Opening this for discussion. Not intending to land this yet. It adds an implementation of the "new streams" to core and adds support to FileHandle with tests and benchmarks just to explore implementation feasibility, performance, etc.

This is an implementation of the "new streams" API for Node.js along with an example integration with FileHandle. This covers the core part of the implementation.

The module is stream/iter. It is gated behind the --experimental-stream-iter CLI flag.

Benchmark results comparing Node.js streams, Web streams, and stream/iter (higher number is better)

| Benchmark | classic | webstream | iter | iter-sync | iter vs classic |
| --- | --- | --- | --- | --- | --- |
| Identity 1MB | 1,245 | 582 | 3,110 | 16,658 | 2.5x |
| Identity 64MB | 31,410 | 14,980 | 33,894 | 62,111 | 1.1x |
| Transform 1MB | 287 | 227 | 325 | 327 | 1.1x |
| Transform 64MB | 595 | 605 | 605 | 573 | 1.0x |
| Compression 1MB | 123 | 98 | 110 | -- | 0.9x |
| Compression 64MB | 329 | 303 | 308 | -- | 0.9x |
| pipeTo 1MB | 1,137 | 494 | 2,740 | 13,611 | 2.4x |
| pipeTo 64MB | 22,081 | 15,377 | 30,036 | 60,976 | 1.4x |
| Broadcast 1c 1MB | 1,365 | 521 | 1,991 | -- | 1.5x |
| Broadcast 2c 1MB | 1,285 | 439 | 1,962 | -- | 1.5x |
| Broadcast 4c 1MB | 1,217 | 322 | 750 | -- | 0.6x |
| File read 16MB | 1,469 | 537 | 1,639 | -- | 1.1x |

It's worth noting that the performance of the FileHandle benchmark added here, which reads files, converts them to upper case, and then compresses them, is on par with Node.js streams and twice as fast as web streams (though web streams are not perf-optimized in any way, so take that 2x with a grain of salt). The majority of the perf cost in the benchmark is due to compression overhead. Without the compression transform, the new stream can be up to 15% faster than reading the file with classic Node.js streams.

The main thing this shows is that the new streams impl can (a) perform reasonably and (b) sit comfortably alongside the existing impls without any backwards compat concerns.

Benchmark runs:

```
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="classic": 0.4520276595366672
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="classic": 0.5974527572097321
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="classic": 0.6425952035725405
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="webstream": 0.1911778984563999
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="webstream": 0.2179878501077266
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="webstream": 0.2446390516960688
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="pull": 0.5118129753083176
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="pull": 0.6280697056085692
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="pull": 0.596177892010514
---
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="classic": 0.44890689503274533
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="classic": 0.5922959407897667
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="classic": 0.6151916200977057
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="webstream": 0.22796906713941217
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="webstream": 0.2517499148269662
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="webstream": 0.2613608248108332
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="pull": 0.4725187688512099
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="pull": 0.5180217625521253
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="pull": 0.616770183722841
```

Opencode/Opus 4.6 were leveraged heavily in creating this PR, following a strict, iterative jasnell-in-the-loop process.

--
Reviewing Guide

The draft spec this is implementing is located at https://stream-iter.jasnell.me/

The implementation is primarily in lib/internal/streams/iter ... that's where you should start. The functionality is split across key files by operation, which should make it easier to review.

The tests are in parallel, prefixed test-stream-iter-*; they are also organized by functional area.

There are benchmarks in bench/streams prefixed with iter-*.

@jasnell jasnell requested review from mcollina and ronag March 1, 2026 18:37
@nodejs-github-bot

Review requested:

  • @nodejs/performance
  • @nodejs/streams

@nodejs-github-bot added the lib / src (Issues and PRs related to general changes in the lib or src directory) and needs-ci (PRs that need a full CI run) labels Mar 1, 2026


@ronag ronag left a comment


Super impressed! This is amazing.

One note. Since this is supposed to be "web compatible" it looks to me like everything is based on Uint8Array, which is a bit unfortunate for Node. Could the Node implementation use Buffer? It would still be compatible; we'd just be able to access the Buffer prototype methods without doing hacks like Buffer.prototype.write.call(...).

@ronag

ronag commented Mar 2, 2026

Also could you do some mitata based benchmarks so that we can see the gc and memory pressure relative to node streams?

@ronag

ronag commented Mar 2, 2026

Another thing: in the async generator case, can we pass an optional AbortSignal? i.e. async function * (src, { signal }). We could maybe even check the function signature and, if it doesn't take a second parameter, not allocate the AbortController at all.

@jasnell

jasnell commented Mar 2, 2026

> One note. Since this is supposed to be "web compatible" it looks to me like everything is based on Uint8Array which is a bit unfortunate for Node. Could the node implementation use Buffer it would still be compatible it's just that we can access the Buffer prototype methods without doing hacks like Buffer.prototype.write.call(...).

This makes me a bit nervous for code portability. If someone starts working with this in Node.js, they will end up writing code that depends on the values being Buffer and not just Uint8Array. When they move that code to another runtime or a standalone impl like https://github.com/jasnell/new-streams, suddenly that assumption breaks.


@benjamingr benjamingr left a comment


> just to explore implementation feasibility, performance, etc

Sounds fine, as this isn't exposed externally at this time.


```js
// Buffer is full
switch (this._backpressure) {
  case 'strict':
```
Member

I'm not sure strict should be the default and not block here.

Member Author

That'll be a big part of the discussion around this. A big part of the challenge with web streams is that backpressure can be fully ignored. One of the design principles for this new approach is to apply it strictly by default. We'll need to debate this. Recommend opening an issue at https://github.com/jasnell/new-streams

@benjamingr benjamingr left a comment

Sorry, I meant to approve. Regardless of design changes/suggestions regarding timing and a lot of other things, as experimental this is fine.

I would maybe update the docs to emphasize the experimental status even more strongly than usual.

@jasnell

jasnell commented Mar 3, 2026

@ronag ... implemented a couple of mitata benchmarks in the https://github.com/jasnell/new-streams repo (the reference impl)

--

Memory Benchmark Results

Environment: Node 25.6.0, Intel Xeon w9-3575X, --expose-gc, mitata with .gc('inner')

Per-Operation Allocations (New Streams vs Web Streams)

| Scenario | Speed | Heap/iter (new) | Heap/iter (web) |
| --- | --- | --- | --- |
| Push write/read (1K x 4KB) | 2.24x faster | 2.06 MB | 1.43 MB |
| Pull + transform (1K x 4KB) | 2.44x faster | 334 KB | 5.57 MB |
| pipeTo + transform (1K x 4KB) | 3.15x faster | 303 KB | 7.47 MB |
| Broadcast 2 consumers (500 x 4KB) | 1.04x faster | 1.92 MB | 1.81 MB |
| Large pull 40MB (10K x 4KB) | 1.26x faster | 2.62 MB | 52.35 MB |

Pipeline scenarios (pull, pipeTo) show the biggest gains: 16-25x less heap because transforms are inline function calls, not stream-to-stream pipes with internal queues. Push is faster but uses slightly more heap due to batch iteration (Uint8Array[]). Broadcast/tee are comparable at this scale.

Sustained Load (97.7 MB volume)

| Scenario | Peak Heap (new) | Peak Heap (web stream) |
| --- | --- | --- |
| pipeTo + transform | 6.9 MB | 50.6 MB |
| Broadcast 2 consumers | 0.5 MB | 42.8 MB |
| Push write/read | 5.9 MB | 2.5 MB |
| Pull + transform | 6.1 MB | 2.8 MB |

pipeTo and broadcast show the largest sustained-load heap difference. Web Streams' pipeThrough chain buffers ~50% of total volume in flight; new streams' pipeTo pulls synchronously through the transform. Broadcast's shared ring buffer peaks at 0.5 MB vs 42.8 MB for tee's per-branch queues.

Zero retained memory for both APIs after completion -- no leaks.


jedwards1211 commented Mar 3, 2026

@ronag passing a signal to an async generator allows the underlying source to abort it, but we're lacking a builtin way for the consumer iterating the async generator to safely cancel the stream. It can .return() its iterator when it's done, but that won't break the async generator out of a pending await until it receives the next chunk, which isn't guaranteed to happen if the underlying source is something nondeterministic like pubsub events. In this case, there would be leaks that are kind of awkward to blame on user error.

Barring an improvement at the language level, the consumer can only safely cancel the underlying source if it has a reference to an AbortController that signals it.

WHATWG Streams don't have this problem if the consumer .cancel()s their reader, though they do if the consumer is async iterating them.

Happy to create examples to reproduce this if it's not clear what I'm talking about.

@ronag

ronag commented Mar 3, 2026

I think you misunderstand. The signal would be for any async calls inside the generator.


jedwards1211 commented Mar 3, 2026

Yes, I'm just saying that doesn't allow the consumer to abort calls the async generator is making, but the consumer often decides when streaming should be aborted.

For example say I'm using a library that handles subscriptions from the frontend. When it gets a subscription it asks me to build an async iterable of events to stream back. Then it's responsible for iterating, then cancelling once the frontend unsubscribes. If the iterable I pass to that library is from an async generator, I'll have to also pass an AbortController to that library for it to safely clean up once the client unsubscribes. If all it has is an AsyncIterable interface, it may leak resources after the client unsubscribes.

This is a fundamental weakness in using async generators for transformation and my longtime frustration with async iteration in general.

In contrast, with WHATWG streams, when a consumer cancels its reader, the underlying source and any TransformStreams get notified to clean up right away.


jedwards1211 commented Mar 3, 2026

@benjamingr was actually talking about the same thing I'm trying to resurrect awareness of in this old issue in the async-iteration proposal

Note one of his comments: tc39/proposal-async-iteration#126 (comment)

This was eight years ago but there hasn't been much improvement on this front, unfortunately.

I'm really hoping I can get everyone to fully understand this pitfall and have a good plan for how to help people avoid it before getting too far along with this new proposed API.

@jasnell jasnell force-pushed the jasnell/new-streams-prototype branch from 9f8af01 to e1e1911 Compare March 3, 2026 17:07
@jasnell jasnell changed the title [DRAFT] stream: prototype for new stream implementation stream: experimental stream/iter implementation Mar 18, 2026


jasnell commented Mar 18, 2026

I've updated the implementation to address the remaining outstanding issues, round out tests, add benchmarks, fix bugs, etc. It's also now behind an experimental cli flag.

This is ready for review.




codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 89.66399% with 606 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.66%. Comparing base (9fc6b64) to head (31d5558).
⚠️ Report is 15 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lib/internal/streams/iter/pull.js | 83.24% | 146 Missing and 6 partials ⚠️ |
| lib/internal/streams/iter/broadcast.js | 84.64% | 114 Missing and 3 partials ⚠️ |
| lib/internal/streams/iter/share.js | 84.17% | 101 Missing and 2 partials ⚠️ |
| lib/internal/streams/iter/from.js | 89.08% | 63 Missing ⚠️ |
| lib/internal/streams/iter/push.js | 91.40% | 59 Missing and 3 partials ⚠️ |
| lib/internal/fs/promises.js | 81.37% | 54 Missing ⚠️ |
| lib/internal/streams/iter/consumers.js | 96.55% | 18 Missing ⚠️ |
| lib/internal/streams/iter/ringbuffer.js | 88.74% | 17 Missing ⚠️ |
| lib/internal/streams/iter/transform.js | 97.35% | 13 Missing and 2 partials ⚠️ |
| lib/internal/streams/iter/duplex.js | 96.45% | 5 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main   #62066      +/-   ##
==========================================
- Coverage   89.68%   89.66%   -0.02%
==========================================
  Files         676      688      +12
  Lines      206575   212553    +5978
  Branches    39549    40713    +1164
==========================================
+ Hits       185262   190592    +5330
- Misses      13446    14080     +634
- Partials     7867     7881      +14
```
| Files with missing lines | Coverage Δ |
| --- | --- |
| lib/internal/bootstrap/realm.js | 96.21% <100.00%> (ø) |
| lib/internal/process/pre_execution.js | 97.47% <100.00%> (+0.54%) ⬆️ |
| lib/internal/streams/iter/types.js | 100.00% <100.00%> (ø) |
| lib/internal/streams/iter/utils.js | 100.00% <100.00%> (ø) |
| lib/stream/iter.js | 100.00% <100.00%> (ø) |
| src/node_builtins.cc | 76.00% <100.00%> (-0.15%) ⬇️ |
| src/node_options.cc | 76.47% <100.00%> (+0.02%) ⬆️ |
| src/node_options.h | 97.94% <100.00%> (+0.01%) ⬆️ |
| lib/internal/streams/iter/duplex.js | 96.45% <96.45%> (ø) |
| lib/internal/streams/iter/transform.js | 97.35% <97.35%> (ø) |

... and 8 more

... and 44 files with indirect coverage changes



jasnell commented Mar 20, 2026

Performed some memory profiling comparing stream/iter with "classic" Node.js streams. This is based on the current iteration of stream/iter at the time this comment is posted; expand the details block for the information.

Details

Methodology

Six benchmarks comparing classic Node.js streams (stream.Readable/stream.Writable/pipeline) against the new stream/iter API across representative usage patterns. All benchmarks run with --expose-gc for forced GC before/after measurement, and use PerformanceObserver to capture GC event counts, types, and pause durations.

Each benchmark runs warmup iterations first, then measured iterations with memory snapshots (via process.memoryUsage() and v8.getHeapStatistics()) taken before and after the measured window.


Benchmark Results

1. Simple Pipe-Through

Scenario: 64MB of 64KB chunks piped from source to no-op sink. 10 iterations.
Measures baseline piping overhead without transforms.

| Metric | Classic | Iter | Iter-Sync | Iter vs Classic |
| --- | --- | --- | --- | --- |
| Time | 20.6ms | 9.8ms | 3.6ms | 2.1x faster |
| Heap delta | 104.1 KB | 58.8 KB | 30.1 KB | 44% less |
| RSS delta | 1.38 MB | 704 KB | 0 B | 49% less |
| GC events | 2 | 13 | 2 | More minor GCs |
| GC pause | 3.26ms | 3.48ms | 2.26ms | Similar |
Analysis: Iter uses less heap per run due to the absence of ReadableState, WritableState, _events, and EventEmitter infrastructure. The higher minor GC count for iter-async reflects more frequent short-lived promise allocations from the async generator protocol, but each is cheap and efficiently collected by Scavenge. Iter-sync eliminates nearly all allocation since generators produce no promises.


2. SSR-Type Concurrent Streams

Scenario: 100 concurrent streams, each producing 100KB through a buffer-copy transform with 4KB chunks. 5 iterations. Simulates server-side rendering workloads.

| Metric | Classic | Iter | Iter vs Classic |
| --- | --- | --- | --- |
| Time | 87.2ms | 33.2ms | 2.6x faster |
| Heap delta | 16.1 KB | -10 KB | Neutral |
| RSS delta | 25.44 MB | 6.88 MB | 73% less RSS |
| GC events | 6 | 3 | 50% fewer |
| GC pause | 9.03ms | 4.18ms | 54% less |

Analysis: The most dramatic result. Classic streams allocate approximately 25 objects per stream (Readable + Writable + ReadableState + WritableState + _events x 2 + pipeline scaffolding + end-of-stream listeners + closures). With 100 concurrent streams x 5 iterations = 500 stream lifecycles, that is roughly 12,500 objects from infrastructure alone. Iter avoids all of this as generators and plain objects have near-zero construction overhead. The 73% RSS reduction is significant for server workloads.


3. Backpressure

Scenario: 8MB of 16KB chunks with a slow consumer that delays every 8th write by 1ms. 5 iterations. Tests buffer memory growth under sustained pressure.

| Metric | Classic | Iter-Push | Iter-Pull |
| --- | --- | --- | --- |
| Time | 348.7ms | 361.3ms | 2.3ms |
| Heap delta | 133.1 KB | 7.7 KB | 17.5 KB |
| RSS delta | 0 B | 0 B | 0 B |
| GC events | 2 | 3 | 4 |
| GC pause | 3.70ms | 5.46ms | 2.76ms |

Analysis: Under backpressure, classic streams buffer {chunk, encoding, callback} objects per write in the Writable's internal buffer -- 133KB of heap growth. Iter-push uses a RingBuffer with just the chunk reference -- 7.7KB, 95% less buffer overhead. The pull model avoids buffering entirely because the source only yields when the consumer requests data.

The iter-pull result (2.3ms) reflects the writeSync try-fallback pattern: 7 out of 8 writes complete synchronously, entirely avoiding the setTimeout delay. This is a valid demonstration of the pull model's advantage -- the sync fast path avoids async overhead when the writer can handle data synchronously.


4. Many Short-Lived Streams

Scenario: 10,000 streams, each producing 4KB (4 x 1KB chunks). 3 iterations. Measures per-stream construction/teardown overhead and GC pressure from rapid allocation and deallocation.

| Metric | Classic | Iter | Iter-Sync | Iter vs Classic |
| --- | --- | --- | --- | --- |
| Time | 468.7ms | 110.6ms | 81.7ms | 4.2x faster |
| Heap delta | 68.7 KB | 75.8 KB | 414.4 KB | Similar |
| RSS delta | 20.54 MB | -504 KB | 1.38 MB | Dramatically less |
| GC events | 23 | 8 | 3 | 65% fewer |
| GC pause | 87.73ms | 8.04ms | 9.35ms | 91% less |

Analysis: The most revealing benchmark for construction/teardown overhead. Classic streams at 10,000 stream lifecycles trigger 23 GC events totaling 87.73ms -- 18.7% of total runtime is GC. Each classic stream creates approximately 25 objects (Readable, Writable, two States, _events, pipeline closures, end-of-stream listeners). Over 10K x 3 iterations x 25 = roughly 750,000 objects created and immediately abandoned. Iter creates a generator + plain object per stream, perhaps 3-4 objects. GC pause drops from 87.73ms to 8.04ms.


5. Deep Transform Chain

Scenario: 16MB of 16KB chunks through 5 identity transforms to a no-op sink. 5 iterations. Measures per-transform memory overhead.

| Metric | Classic | Iter | Iter-Sync | Iter vs Classic |
| --- | --- | --- | --- | --- |
| Time | 14.9ms | 10.6ms | 3.5ms | 1.4x faster |
| Heap delta | 77.0 KB | 25.4 KB | -32 B | 67% less |
| RSS delta | 1.38 MB | 0 B | 0 B | No RSS growth |
| GC events | 6 | 9 | 1 | |
| GC pause | 4.88ms | 3.29ms | 2.32ms | 33-52% less |

Analysis: Each classic Transform is a full Duplex stream (Readable + Writable + both States + _events + pipeline wiring) -- 5 transforms means roughly 125 infrastructure objects. Iter fuses all 5 stateless transforms into a single generator layer -- one additional generator frame regardless of depth. The sync path achieves zero net heap growth.


6. Fan-Out

Scenario: 1 source producing 16MB of 16KB chunks consumed by 4 readers. 5 iterations. Classic uses pipe() to PassThrough streams; iter uses broadcast.

| Metric | Classic | Broadcast | Iter vs Classic |
| --- | --- | --- | --- |
| Time | 9.6ms | 11.6ms | 21% slower |
| Heap delta | 132.8 KB | 14.7 KB | 89% less |
| RSS delta | 2.06 MB | 704 KB | 66% less |
| GC events | 5 | 9 | More minor GCs |
| GC pause | 2.81ms | 2.90ms | Similar |

Analysis: Fan-out is the one scenario where classic streams are faster. pipe() to multiple PassThrough streams is highly optimized in Node.js. However, broadcast uses dramatically less memory: 14.7KB vs 132.8KB heap delta. Classic creates 4 PassThrough streams (each a full Duplex) + pipe wiring + end-of-stream listeners. Broadcast shares a single RingBuffer with cursor-based consumers where each consumer is just a {cursor, resolve, reject, detached} state object.


Structural Comparison

Per-Stream Construction Cost

| Component | Classic Streams | stream/iter |
| --- | --- | --- |
| Stream object | Readable/Writable instance | Generator function (no object) |
| State tracking | ReadableState + WritableState (~40 fields each) | Closure variables (zero objects) |
| Event system | _events object + listener arrays | None |
| Buffer | Array + index + compaction logic | RingBuffer (power-of-2 backing array) |
| Pipeline wiring | AbortController + eos listeners + destroyer closures (15-20 objects) | for await loop (0 objects) |
| Backpressure state | awaitDrainWriters + drain listeners + cork/uncork | RingBuffer capacity check (0 objects) |
| Total per stream | ~25-30 heap objects | ~3-5 heap objects |

Per-Chunk/Per-Batch Hot Path Cost

| Cost | Classic | stream/iter |
| --- | --- | --- |
| Iterator result | N/A (push model) | 1 {done, value} per batch |
| Promise per chunk | 0 (callback-based) | 1 per batch (async gen yield) |
| Backpressure buffering | 1 {chunk, enc, cb} object | 0 (pull model) or 1 chunk ref (push) |
| Write completion | afterWriteTickInfo batching | kResolvedPromise cached (0 alloc) |
| Batch amortization | 1 chunk per event loop tick | N chunks per batch (configurable) |

GC Impact Model

The fundamental difference: classic streams create many long-lived objects that survive young generation collection (State objects, _events, listener closures live for the stream's entire lifetime). This promotes them to old space, requiring expensive Mark-Sweep collection.

Stream/iter creates mostly short-lived objects (iterator results, promises) that die within one async tick and are efficiently collected by Scavenge (minor GC).

For high-churn scenarios (many short-lived streams), classic streams create and abandon ~25 objects per stream lifecycle that must all be traced and collected. Iter creates ~3-5 objects. The GC impact scales linearly with stream count -- at 10K streams, classic spends 88ms in GC vs iter's 8ms.


Summary

| Scenario | Time Winner | Memory Winner | GC Winner |
| --- | --- | --- | --- |
| Simple pipe-through | iter (2.1x) | iter (44% less) | Comparable |
| SSR concurrent (100 streams) | iter (2.6x) | iter (73% less RSS) | iter (54% less) |
| Backpressure | pull (150x) | push (95% less) | Comparable |
| Many short streams (10K) | iter (4.2x) | iter (less RSS) | iter (91% less) |
| Deep transforms (5x) | iter (1.4x) | iter (67% less) | iter (33% less) |
| Fan-out (4 consumers) | classic (1.2x) | broadcast (89% less) | Comparable |

Stream/iter is consistently more memory-efficient across all scenarios, with the advantage most pronounced in high-concurrency and high-churn workloads. The only throughput concession is fan-out, where classic's highly-optimized pipe() path is slightly faster -- but even there, broadcast uses dramatically less memory.

The pull model's inherent backpressure (source only yields on demand) eliminates buffer-related memory growth entirely for the most common use pattern. The batch-oriented design (Uint8Array[] rather than individual chunks) amortizes the per-item overhead of Promises and iterator results across all chunks in a batch, making the async iteration protocol overhead negligible at typical chunk sizes.


jasnell commented Mar 20, 2026

Similar comparison with stream/iter to web streams (noting that our web streams impl has never been fully optimized).

Details

Methodology

Six benchmarks comparing the Web Streams API (ReadableStream/WritableStream/TransformStream/pipeTo/pipeThrough/tee) against the new stream/iter API across representative usage patterns. All benchmarks run with --expose-gc for forced GC before/after measurement, and use PerformanceObserver to capture GC event counts, types, and pause durations.

Each benchmark runs warmup iterations first, then measured iterations with memory snapshots (via process.memoryUsage() and v8.getHeapStatistics()) taken before and after the measured window.


Benchmark Results

1. Simple Pipe-Through

Scenario: 64MB of 64KB chunks piped from source to no-op sink. 10 iterations.

| Metric | Web Streams | Iter | Iter-Sync | Iter vs WS |
| --- | --- | --- | --- | --- |
| Time | 28.8ms | 6.6ms | 3.3ms | 4.4x faster |
| Heap delta | 142.6 KB | 13.8 KB | 30.5 KB | 90% less |
| GC events | 31 | 7 | 1 | 77% fewer |
| GC pause | 4.01ms | 3.06ms | 2.20ms | 24% less |
| Minor GC | 30 | 6 | 0 | 80% fewer |
Analysis: Web Streams trigger 30 minor GCs for the same data volume -- one roughly every 21MB of throughput. Each pipeTo iteration creates internal reader/writer pairs, promise-per-chunk from the pull protocol, and controller state. Iter-async needs only 6 minor GCs (one promise per batch from the async generator). Iter-sync eliminates minor GC entirely. The 90% heap reduction reflects iter's absence of ReadableStreamDefaultController, WritableStreamDefaultController, ReadableStreamDefaultReader, internal queuing strategy objects, and the per-chunk promise overhead of the Web Streams pull protocol.


2. SSR-Type Concurrent Streams

Scenario: 100 concurrent streams, each producing 100KB through a buffer-copy transform with 4KB chunks. 5 iterations.

| Metric | Web Streams | Iter | Iter vs WS |
| --- | --- | --- | --- |
| Time | 132.7ms | 26.7ms | 5.0x faster |
| Heap delta | 191.1 KB | 50.8 KB | 73% less |
| RSS delta | 30.01 MB | 0 B | No RSS growth |
| GC events | 10 | 3 | 70% fewer |
| GC pause | 9.55ms | 3.48ms | 64% less |

Analysis: The most dramatic result. At 100 concurrent streams, Web Streams consume 30MB of RSS growth while iter shows none. Each Web Streams pipeline creates ReadableStream + TransformStream + WritableStream, each with their own controllers, internal queues, [[readableStreamController]] / [[writableStreamDefaultWriter]] internal slots, strategy objects, and promise machinery. That is approximately 30-40 objects per stream. 100 streams x 5 iterations = 500 lifecycles producing roughly 15,000-20,000 infrastructure objects. Iter produces a generator + plain objects per stream -- approximately 3-5 objects.


3. Backpressure

Scenario: 8MB of 16KB chunks with a slow consumer that delays every 8th write by 1ms. 5 iterations.

| Metric | Web Streams | Iter-Push | Iter-Pull |
| --- | --- | --- | --- |
| Time | 351.1ms | 366.2ms | 2.7ms |
| Heap delta | 28.1 KB | 21.4 KB | 6.4 KB |
| RSS delta | 6.19 MB | 0 B | 1.38 MB |
| GC events | 10 | 3 | 2 |
| GC pause | 3.83ms | 4.04ms | 2.22ms |
| Minor GC | 9 | 2 | 1 |

Analysis: Under backpressure with equivalent delay patterns, Web Streams and iter-push perform similarly on time (both dominated by the 1ms delays). However, Web Streams show 6.19MB RSS growth vs zero for iter-push. Web Streams' internal queuing strategy allocates queue entries with { value, size } wrappers and maintains separate [[queue]] arrays on both the readable and writable sides. Iter-push uses a single RingBuffer with direct chunk references.

The pull model (2.7ms) demonstrates its structural advantage: the writeSync try-fallback pattern means 7/8 writes complete synchronously, entirely skipping the delay. The Web Streams pull protocol has no sync fast path -- every controller.enqueue() / reader.read() goes through the promise-based pull protocol.


4. Many Short-Lived Streams

Scenario: 10,000 streams, each producing 4KB (4 x 1KB chunks). 3 iterations.

| Metric | Web Streams | Iter | Iter-Sync | Iter vs WS |
| --- | --- | --- | --- | --- |
| Time | 326.1ms | 98.9ms | 89.2ms | 3.3x faster |
| Heap delta | 371.5 KB | 99.0 KB | 424.7 KB | 73% less (async) |
| GC events | 268 | 36 | 5 | 87% fewer |
| GC pause | 20.65ms | 11.05ms | 12.04ms | 47% less |
| Minor GC | 267 | 35 | 4 | 87% fewer |

Analysis: The GC pressure difference is staggering. Web Streams trigger 267 minor GCs for 30,000 stream lifecycles (10K x 3 iterations) -- nearly one Scavenge per 112 streams. Each ReadableStream + WritableStream pair creates controllers, internal slot objects, queuing strategy instances, and the pipeTo algorithm creates ReadableStreamDefaultReader + WritableStreamDefaultWriter with their own promise slots. At approximately 30+ objects per stream pair, that is 900,000+ objects created and abandoned.

Iter-async triggers only 36 minor GCs (87% fewer) because each stream lifecycle creates roughly 3-5 objects (generator, iterator object, argument parsing result). The GC pause time difference (20.65ms vs 11.05ms) means Web Streams spend 6.3% of total runtime in GC vs iter's 11.2%. While iter's GC percentage is higher, its total runtime is 3.3x shorter, so the absolute time in GC is still lower.


5. Deep Transform Chain

Scenario: 16MB of 16KB chunks through 5 identity transforms to a no-op sink. 5 iterations.

| Metric | Web Streams | Iter | Iter-Sync | Iter vs WS |
| --- | --- | --- | --- | --- |
| Time | 69.8ms | 9.6ms | 2.9ms | 7.3x faster |
| Heap delta | 190.4 KB | -53 KB | 30.1 KB | Net negative |
| GC events | 60 | 9 | 1 | 85% fewer |
| GC pause | 8.95ms | 3.28ms | 2.70ms | 63% less |
| Minor GC | 59 | 8 | 0 | 86% fewer |

Analysis: The widest performance gap. Each Web Streams TransformStream creates a ReadableStream + WritableStream pair internally (with all their controllers and internal slots), plus the transform controller. 5 transforms = 5 full stream pairs = approximately 150+ infrastructure objects, and every chunk passes through 5 promise-based pull cycles. The pipeThrough chain creates 5 pipeTo algorithms running concurrently, each with its own reader/writer pair.

Iter fuses all 5 stateless transforms into a single generator layer. One additional generator frame regardless of transform depth. The per-chunk cost is 5 function calls (the transforms) with no additional promise or object creation. 59 minor GCs for Web Streams vs 8 for iter reflects the massive object creation difference.

Iter-sync achieves 2.9ms with zero minor GC -- the entire pipeline runs as synchronous function calls through a single generator.


6. Fan-Out

Scenario: 1 source producing 16MB of 16KB chunks consumed by 4 readers. 5 iterations.

| Metric | Web Streams | Broadcast | Iter vs WS |
| --- | --- | --- | --- |
| Time | 35.3ms | 11.3ms | 3.1x faster |
| Heap delta | 124.3 KB | 3.5 KB | 97% less |
| RSS delta | 3.44 MB | 704 KB | 80% less |
| GC events | 41 | 9 | 78% fewer |
| GC pause | 4.98ms | 2.81ms | 44% less |
| Minor GC | 40 | 8 | 80% fewer |

Analysis: Web Streams' tee() creates a full branch of the stream for each split. To get 4 consumers from tee(), two levels of teeing are needed (rs.tee() then tee each branch), creating 6 ReadableStream instances total with their controllers, internal queues, and per-chunk promise resolution. Each tee branch independently copies chunk references and maintains separate queue state.

Broadcast shares a single RingBuffer across all consumers. Each consumer is a {cursor, resolve, reject, detached} state object -- 4 objects vs Web Streams' 6 full stream instances with controllers. The 97% heap reduction reflects this fundamental architectural difference: shared buffer with cursors vs independent queue copies.


Structural Comparison

Per-Stream Construction Cost

| Component | Web Streams | stream/iter |
| --- | --- | --- |
| Readable side | ReadableStream + ReadableStreamDefaultController + strategy + [[queue]] | Generator function (no object) |
| Writable side | WritableStream + WritableStreamDefaultController + strategy + [[queue]] | Plain writer object (user-provided) |
| Pipe connection | ReadableStreamDefaultReader + WritableStreamDefaultWriter + promise slots | for await loop (0 objects) |
| Transform | TransformStream = RS + WS + TransformStreamDefaultController | Single function reference |
| Queuing strategy | CountQueuingStrategy or ByteLengthQueuingStrategy instance | Integer HWM on RingBuffer |
| Backpressure tracking | [[backpressure]] slot + desiredSize on controller + promise-based signaling | RingBuffer capacity check |
| Total per pipeline | ~30-40 heap objects | ~3-5 heap objects |

Per-Chunk Protocol Cost

| Cost | Web Streams | stream/iter |
| --- | --- | --- |
| Read protocol | reader.read() returns Promise wrapping {done, value} | for await gets batch from generator |
| Write protocol | writer.write() returns Promise, queued internally | writeSync() returns boolean (0 alloc) |
| Backpressure signal | Promise-based: writer.ready resolves when space available | Sync: RingBuffer length check |
| Transform per-chunk | controller.enqueue() through full RS/WS queue machinery | Direct function call, return value |
| Per-chunk promises | 2+ promises minimum (read + write) | 0-1 promise (batch amortized) |
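The per-chunk vs per-batch promise cost can be illustrated with plain async iteration. This is a sketch: `webStreamsDrain` uses the real Web Streams reader protocol, while `batchedSource` is an illustrative stand-in for a batch-yielding source, not the stream/iter API.

```javascript
// Web Streams read protocol: one promise and one {done, value} result
// object allocated per chunk.
async function webStreamsDrain(rs) {
  const reader = rs.getReader();
  const out = [];
  for (;;) {
    const { done, value } = await reader.read(); // per-chunk promise
    if (done) return out;
    out.push(value);
  }
}

// Batch model: the async generator pays one promise per *batch*; the chunks
// inside a batch are consumed with plain synchronous iteration.
async function* batchedSource(chunks, batchSize) {
  for (let i = 0; i < chunks.length; i += batchSize) {
    yield chunks.slice(i, i + batchSize); // 1 promise amortized over batchSize chunks
  }
}

async function batchedDrain(source) {
  const out = [];
  for await (const batch of source) {
    for (const chunk of batch) out.push(chunk); // sync inner loop, 0 promises
  }
  return out;
}

const drained = batchedDrain(batchedSource([1, 2, 3, 4, 5], 2));
```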

GC Pressure Model

Web Streams create many medium-lived objects per pipeline: controllers, readers, writers, strategy objects, and internal queue entries. These objects live for the duration of the stream but are recreated for each new stream instance. The per-chunk promise overhead (2+ promises per chunk from the read/write protocol) generates significant young-generation pressure.

Stream/iter creates few short-lived objects per batch: one iterator result and one promise from the async generator yield. The writeSync fast path eliminates write-side promise creation entirely. Batch amortization means the per-chunk overhead is divided by batch size.
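The writeSync fast path amounts to returning a boolean capacity signal instead of a promise. A hedged sketch follows: the `BoundedQueue` class and its HWM handling are illustrative stand-ins, not the actual RingBuffer implementation.

```javascript
// Illustrative: a synchronous write path returns a boolean backpressure
// signal, so the happy path allocates no promise. Only when the queue is
// full would a producer fall back to an async wait.
class BoundedQueue {
  constructor(highWaterMark) {
    this.items = [];
    this.highWaterMark = highWaterMark;
  }
  // Returns true while there is room for more writes; 0 allocations.
  writeSync(chunk) {
    this.items.push(chunk);
    return this.items.length < this.highWaterMark;
  }
  readSync() {
    return this.items.shift();
  }
}

const q = new BoundedQueue(2);
const first = q.writeSync('a');  // room remains -> true
const second = q.writeSync('b'); // at capacity -> false, producer should back off
```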

The GC data across all 6 benchmarks:

| Benchmark | Web Streams Minor GCs | Iter Minor GCs | Reduction |
| --- | --- | --- | --- |
| Pipe-through | 30 | 6 | 80% |
| SSR (100 streams) | 9 | 2 | 78% |
| Backpressure | 9 | 2 | 78% |
| Many streams (10K) | 267 | 35 | 87% |
| Deep transforms (5) | 59 | 8 | 86% |
| Fan-out (4) | 40 | 8 | 80% |
| Average | | | 82% |

Summary

| Scenario | Time (Iter vs WS) | Memory Winner | GC Winner |
| --- | --- | --- | --- |
| Simple pipe-through | 4.4x faster | iter (90% less heap) | iter (77% fewer) |
| SSR concurrent (100 streams) | 5.0x faster | iter (73% less heap, no RSS growth) | iter (70% fewer) |
| Backpressure | Comparable (push); 130x faster (pull) | iter (no RSS growth) | iter (70% fewer) |
| Many short streams (10K) | 3.3x faster | iter (73% less heap) | iter (87% fewer) |
| Deep transforms (5x) | 7.3x faster | iter (net negative heap) | iter (85% fewer) |
| Fan-out (4 consumers) | 3.1x faster | iter (97% less heap) | iter (78% fewer) |

Stream/iter outperforms Web Streams on every metric across every scenario tested. The advantages are structural:

  1. No controller/reader/writer object overhead. Web Streams require ReadableStreamDefaultController, ReadableStreamDefaultReader, WritableStreamDefaultController, WritableStreamDefaultWriter, and queuing strategy instances per stream. Iter uses generators and plain objects.

  2. No per-chunk promise tax. Web Streams' pull protocol requires at minimum 2 promises per chunk (reader.read() + writer.write()). Iter's batch model amortizes one promise per batch across all chunks, and the writeSync fast path eliminates the write-side promise entirely.

  3. Transform fusion. Web Streams' pipeThrough creates a full ReadableStream + WritableStream pair per transform. Iter fuses consecutive stateless transforms into a single generator layer regardless of depth.

  4. Shared-buffer fan-out. Web Streams' tee() creates independent stream branches with separate queues. Broadcast shares a single RingBuffer with cursor-based consumers.

The result is an average 82% reduction in GC events, with the gap widest in transform-heavy and high-churn workloads where Web Streams' object-per-stream and promise-per-chunk costs compound.

jasnell (Member, Author) commented Mar 21, 2026

Ok... with the latest round of test coverage updates, the initial development on this is done. Just waiting for code review.


Labels

- experimental: Issues and PRs related to experimental features.
- lib / src: Issues and PRs related to general changes in the lib or src directory.
- needs-ci: PRs that need a full CI run.
- semver-minor: PRs that contain new features and should be released in the next minor version.
- stream: Issues and PRs related to the stream subsystem.
